The Effect of Envelope Power on the Ability to Lateralize 3-Tone Complexes on
the Basis of Interaural Envelope Delays.

Raymond H. Dye, Mark Stellmack and Andrew Niemiec

The  goal  of  this  study  was  to  assess  the  viability  of  envelope  detection 
as  the  front-end  of  high-frequency  binaural  processors.  Interaural  delay 
between  envelopes  of  high-frequency  carriers  is  a  potent  cue  for  lateralizing 
stimuli presented through headphones, and it was our hope to characterize the
envelope  extraction  stage  by  presenting  waveforms  whose  effective  envelopes 
were  manipulated  without  altering  the  spectrum  of  the  signals.  This  was 
accomplished  by  manipulating  the  starting  phases  of  3  component  signals  such 
that  the  starting  phases  were  all  0  degrees,  0-45-0  degrees,  or  0-90-0 
degrees. The effective envelopes of these waveforms are maximal when all
starting phases are the same and minimal when the starting phases are 0-90-0.
To characterize the effective envelope of these waveforms, the standard
deviation of the instantaneous power in units of average power was computed
(defined as z). To the extent that envelopes are extracted by passing
waveforms  through  a  nonlinearity  (half-wave  rectifier  or  square-law  device) 
followed  by  a  low-pass  filter,  then  one  should  expect  that  waveforms  of  higher 
z  should  be  more  easily  lateralized. 
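
The z statistic can be illustrated with a short numerical sketch. It assumes that "instantaneous power" refers to the squared envelope (the low-pass part of the squared waveform), that all components have unit amplitude, and that the sampling rate, duration, and function names are free choices made here for illustration; it is not the authors' own computation.

```python
import numpy as np

FS = 48000   # sampling rate in Hz (assumed)
N = FS       # one second, so every envelope period fits a whole number of times

def z_value(power):
    """Standard deviation of instantaneous power in units of average power."""
    return np.std(power) / np.mean(power)

def three_tone_power(center_phase_deg, delta_f=100.0):
    """Squared envelope of three equal-amplitude tones with phases 0, phi, 0."""
    t = np.arange(N) / FS
    phi = np.deg2rad(center_phase_deg)
    # Complex envelope relative to the carrier: the two sidebands sum to a
    # cosine at delta_f, and the center component carries the phase shift.
    env = 2.0 * np.cos(2 * np.pi * delta_f * t) + np.exp(1j * phi)
    return np.abs(env) ** 2

def sam_power(depth, f_mod=100.0):
    """Squared envelope of a sinusoidally amplitude-modulated (SAM) tone."""
    t = np.arange(N) / FS
    return (1.0 + depth * np.cos(2 * np.pi * f_mod * t)) ** 2

for phase in (0, 45, 90):
    print(f"0-{phase}-0: z = {z_value(three_tone_power(phase)):.3f}")
for depth in (1.0, 0.7, 0.35):
    print(f"SAM {depth:.0%}: z = {z_value(sam_power(depth)):.3f}")
```

Under these assumptions the 0-45-0 complex and the 70% SAM tone give z of about 0.82 and 0.81, and the 0-90-0 complex and the 35% SAM tone both give about 0.47, which is in line with the pairing of conditions described in this abstract.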

Threshold interaural envelope delays were measured in a 2-AFC task as a
function of frequency spacing (Δf = 25, 50, 100, and 200 Hz) with the carrier
frequency  (fc)  set  to  4000  Hz.  The  level  of  each  component  was  50  dB  SPL,  and 
the  signal  duration  was  200  ms  with  10-ms  linear  rise/decay  times.  For 
comparison,  thresholds  were  also  measured  for  SAM  4000-Hz  tones  with  reduced 
depths  of  modulation  so  that  the  effects  of  decreasing  modulation  depth  and 
decreasing  envelope  power  could  be  directly  compared. 

Reducing  envelope  power  had  only  a  negligible  effect  on  threshold 
interaural envelope delays for Δf's of 200 and 100 Hz but a large and
systematic effect at 50 Hz and especially at 25 Hz. For three of the five
observers the 0°-90°-0°, Δf = 25 Hz condition was so difficult that 75% correct
could not be reached with delays as large as 2000 μs.

In  order  to  determine  whether  envelope  manipulations  brought  about  by 
other  means  would  produce  similar  effects  on  lateralization  performance, 
threshold  interaural  envelope  delays  were  measured  for  SAM  4000-Hz  tones  that 
were  either  100%  modulated,  70%  modulated,  or  35%  modulated.  The  70%  and  35% 
modulated  waveforms  have  z's  that  are  equal  to  those  of  the  0-45-0  and  0-90-0 
conditions  respectively.  To  compare  performance  between  SAM  and  equal 
amplitude  components,  some  decision  had  to  be  reached  regarding  the  effective 
modulation rate of the complexes generated from equal amplitude components.
For  0-90-0,  the  modulation  frequency  is  twice  the  frequency  spacing,  but  the 
fact  that  the  intermediate  temporal  lobes  for  0-45-0  and  0-0-0  are  relatively 
small  leads  to  ambiguity  regarding  the  effective  modulation  frequency.  In 
comparing  100%  AM  and  the  0-0-0  condition,  the  agreement  is  good  if  one  treats 
the  effective  modulation  frequency  of  the  equal  amplitude  complexes  as  twice 
the  frequency  spacing.  For  comparisons  between  0-45-0  and  0-90-0  with  70%  and 
35%  amplitude  modulation  waveforms,  the  agreement  is  good  if  one  compares 
waveforms having the same frequency spacing, indicating that the secondary
lobes in the temporal waveforms of the 0-45-0 and 0-90-0 waveforms do not
double  the  effective  modulation  rate.  In  general,  the  results  obtained  are 
consistent  with  binaural  cross-correlation  of  the  outputs  of  envelope 
detectors. 
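
The ambiguity about effective modulation rate can be seen by writing out the squared envelope. The following is a short derivation under the assumption of unit-amplitude components at f_c - Δf, f_c, and f_c + Δf with starting phases 0, φ, 0; the symbols x(t) and E(t) are introduced here for illustration only.

```latex
\[
x(t) = \left[2\cos(2\pi\,\Delta f\,t) + \cos\phi\right]\cos(2\pi f_c t)
       - \sin\phi\,\sin(2\pi f_c t)
\]
\[
E^2(t) = \left[2\cos(2\pi\,\Delta f\,t) + \cos\phi\right]^2 + \sin^2\phi
       = 3 + 4\cos\phi\,\cos(2\pi\,\Delta f\,t) + 2\cos(4\pi\,\Delta f\,t)
\]
```

For φ = 90° the term at Δf vanishes and the squared envelope fluctuates only at 2Δf. For φ = 0° and 45° the term at Δf is the larger of the two, so the intermediate lobes at 2Δf are comparatively small.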


Discrimination  of  Interaural  Envelope  Delays:  The  Effect  of  Randomizing 
Component  Starting  Phase. 

Raymond H. Dye, Andrew Niemiec and Mark Stellmack

The  goal  of  this  study  was  to  examine  the  nature  of  envelope  extraction 
in  the  discrimination  of  high-frequency  waveforms  on  the  basis  of  envelope 
delay.  Threshold  interaural  envelope  delays  were  measured  for  3-  and  5- 
component  complexes  for  which  the  starting  phases  of  all  components  were 
either  0°  or  randomized  between  intervals  of  a  2-AFC  task.  The  carrier 
frequency  was  4  kHz  and  the  modulation  frequency  was  varied  from  25  to  500  Hz. 
The  results  showed  that  thresholds  were  greater  for  the  phase-randomized 
conditions  than  the  0-phase  conditions.  The  phase  effect  tended  to  diminish 
with increasing modulation frequency for 3-component complexes but not for the
5-component complexes. Sensitivity to envelope delay was better for
5-component complexes than for 3-component complexes at most modulation
frequencies.  In  general,  the  results  showed  superior  lateralization 
performance  for  conditions  in  which  the  envelope  fluctuations  were  greater,  a 
finding  that  is  consistent  with  models  of  high-frequency  binaural  processing 
that  include  envelope  extraction  prior  to  binaural  comparison. 
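
A model of this general kind can be sketched as envelope extraction at each ear followed by cross-correlation. The sketch below is only an illustration under stated assumptions: the sampling rate, the moving-average low-pass filter, the ungated stimulus, and all function names are choices made here and are not taken from the studies.

```python
import numpy as np

FS = 48000    # sampling rate in Hz (assumed)
FC = 4000.0   # carrier frequency in Hz
FM = 100.0    # modulation frequency in Hz
N = 9600      # 200 ms of signal at 48 kHz

def sam_tone(env_delay_s):
    """100% SAM tone whose envelope, but not its carrier, is delayed."""
    t = np.arange(N) / FS
    envelope = 1.0 + np.cos(2 * np.pi * FM * (t - env_delay_s))
    return envelope * np.cos(2 * np.pi * FC * t)

def extract_envelope(x):
    """Half-wave rectify, then low-pass with a one-carrier-period moving average."""
    rectified = np.maximum(x, 0.0)
    n_taps = int(round(FS / FC))
    return np.convolve(rectified, np.ones(n_taps) / n_taps, mode="same")

def envelope_lag(left, right, max_lag):
    """Lag in samples that maximizes the envelope cross-correlation.

    A positive value means the right-ear envelope lags the left-ear envelope.
    """
    env_l, env_r = extract_envelope(left), extract_envelope(right)
    core = env_l[max_lag:N - max_lag]       # fixed left-ear analysis segment
    core = core - core.mean()
    lags = np.arange(-max_lag, max_lag + 1)
    xcorr = [np.dot(core, env_r[max_lag + k:N - max_lag + k]) for k in lags]
    return int(lags[np.argmax(xcorr)])

left = sam_tone(0.0)
right = sam_tone(500e-6)                    # envelope delayed by 500 microseconds
lag = envelope_lag(left, right, max_lag=240)
print(f"estimated envelope delay: {1e6 * lag / FS:.0f} microseconds")
```

The search range of 240 samples (5 ms) is half the modulation period, which keeps the correlation peak unambiguous for this periodic envelope.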


Lateralization  of  Narrowband  Noise  on  the  Basis  of  Envelope  Delay  as  a 
Function  of  Envelope  Power. 

Raymond H. Dye

A  study  was  undertaken  to  ascertain  the  extent  to  which  envelope  power 
could  account  for  the  lateralizability  of  narrow  bands  of  noise.  Threshold 
interaural  envelope  delays  were  measured  in  a  2-AFC  task  for  narrow  bands  of 
noise  whose  center  frequency  was  fixed  at  4000  Hz.  The  bandwidth  of  the  noise 
was  set  to  50,  100,  or  200  Hz.  All  components  of  the  noise  were  equal 
amplitude,  and  different  noise  samples  were  generated  by  randomizing  the 
starting  phases  of  the  components.  For  this  study  the  waveforms  were 
classified into five ranges of z-values and efforts were made to correlate
performance with z. Data were gathered using blocked trials (all 100 trials
in a run employing waveforms within the same range of z's) and mixed trials
(all ranges of z's run in a 100-trial block). While performance with high
effective envelope depths was superior to that with low envelope depths on the
average, it appears that the relationship between z and threshold envelope delays
is not simple. This is especially true of the data gathered with a 50-Hz wide
band of noise. Part of the problem concerns the fact that the location of the
major  peak  in  the  temporal  waveform  varies  from  one  waveform  to  the  next  when 
the bandwidths are relatively narrow. Waveforms with early peaks appear to be
easier to lateralize than those with later peaks. As such, the data gathered
at a single restricted range of z-values tend to be quite variable. We are
currently  examining  signal  generation  schemes  that  will  allow  us  to  generate 
waveforms  of  variable  effective  envelope  depth  without  altering  the  location 
of  the  prominent  peak  in  the  temporal  waveform.  Although  factors  other  than 
effective  envelope  depth  (as  measured  by  z)  are  important  for  the  sensitivity 
of  observers  to  envelope  delays,  it  is  clearly  the  case  that  some  sort  of 
envelope  extraction  mechanism  precedes  the  derivation  of  envelope  delay  by  the 
binaural  auditory  system  via  cross-correlation. 

Detection  of  Interaural  Envelope  Delays  in  High-Frequency  Targets  Presented 
Amongst  Diotic  Distractors. 

Raymond H. Dye

A  study  was  undertaken  that  sought  to  gain  insight  into  the  processes  by 
which  multiple  high-frequency  carriers  are  lateralized  when  they  are  amplitude 
modulated.  Of  special  interest  were  potential  differences  in  performance  when 
the  target  carrier  (the  component  that  was  sinusoidally  amplitude-modulated 
such  that  the  envelope  presented  to  the  left  ear  led  the  one  that  was 
presented to the right ear) and neighboring carriers were modulated at the
same rate (200 Hz) versus when the distractors were modulated at a rate different
from that of the target. The interaurally delayed component ("target") was a
3000-Hz carrier, 100% amplitude-modulated by a 200-Hz sinusoid. The
distractors  were  additional  carriers  that  were  100%  amplitude-modulated  at  25, 
50,  100,  200,  or  400  Hz.  The  number  of  distractors  was  fixed  at  two,  and  the 
spacing  between  the  distractors  and  the  target  (Af)  was  varied  from  500  to 
1500  Hz.  For  comparison,  threshold  delays  were  measured  for  amplitude 
modulated  3000-Hz  targets  presented  in  isolation.  The  signals  were  200  ms  in 
duration,  gated  with  10  ms  rise-decay  times,  with  the  distractors  and  the 
target  gated  simultaneously  at  the  two  ears.  A  two-interval  task  was  used 
such  that  the  first  interval  always  presented  a  diotic  3000-Hz  carrier 
modulated  at  200  Hz  and  the  second  interval  presented  all  three  carriers.  On 
half  of  the  trials,  the  modulated  3000-Hz  target  was  interaurally  delayed  (to 
the  right  channel)  during  the  second  interval;  otherwise  it  was  diotic  (as 
were the distractors).

In addition to having their performance assessed, subjects were
questioned regarding the listening strategies employed on a particular run
of trials. The subjects' reports indicate that (1) targets and distractors
that  are  modulated  at  the  same  frequency  tend  to  be  perceptually  fused  such 
that  the  entire  complex  sounds  shifted  during  dichotic  presentations,  even 
though  only  the  3000-Hz  carrier  is  delayed,  (2)  the  detection  of  delays 
presented  when  the  distractors  and  targets  are  modulated  at  different  rates  is 
accomplished  by  "hearing  out"  the  target  when  it  is  delayed  during  the  second 
interval  as  long  as  the  frequency  separation  between  the  target  and 
distractors  is  at  least  1000  Hz,  (3)  the  target  and  distractors  are  often 
perceptually  fused,  forming  single  intracranial  events,  when  the  target  and 
distractors  are  separated  by  only  500  Hz  even  though  the  modulation 
frequencies of the target and distractors might differ (25 Hz vs. 200 Hz). In
many of these cases under (3), the task can be accomplished either by hearing
out  the  delayed  component  or  fusing  the  complex. 

Objective  psychophysical  measures  of  performance  show  that  sensitivity  is 
quite  good  for  conditions  in  which  the  distractors  and  target  are  spectrally 
remote  and  modulated  at  different  rates,  with  performance  approaching  what  is 
found  for  targets  presented  in  isolation.  When  the  target  and  distractors  are 
modulated  at  the  same  rate,  significant  binaural  interference  is  observed 
regardless  of  the  frequency  separation  between  targets  and  distractors;  the 
presence  of  distractors  elevates  thresholds  by  a  factor  of  2-3.  When  targets 
and distractors are within 1000 Hz of one another and they are modulated
at  different  rates,  sensitivity  to  envelope  delays  can  be  especially  poor, 
with  some  subjects  requiring  5-8  times  larger  delays  when  the  distractors  are 
present  than  when  they  are  absent. 


Detection  and  Recognition  of  Amplitude  Modulation  with  Tonal  Carriers. 

Stanley  Sheft  and  William  Yost 

The  ability  of  listeners  to  process  multiple  sources  of  sinusoidal 
amplitude  modulation  (AM)  was  evaluated  using  both  detection  and  recognition 
procedures.  For  all  conditions,  the  stimulus  was  a  two-tone  (909  and  4186  Hz) 
complex.  Test-AM  frequencies  were  4  and  17  Hz.  In  the  detection  paradigm, 
d-primes  obtained  with  simultaneous  modulation  of  both  tones  were  compared  to 
the  d-primes  obtained  with  modulation  of  just  one  of  the  tonal  components. 

When  both  tones  were  modulated  at  the  same  frequency,  a  phase  disparity 
between  the  envelopes  reduced  AM  detectability.  With  the  envelopes  in  phase, 
results  showed  a  linear  summation  of  the  d-primes  for  the  individual 
components.  With  modulation  of  the  two  tones  at  different  AM  frequencies, 
performance  approximated  optimal  processing  of  two  sources  of  uncorrelated 
information.  A  similar  result  was  obtained  in  the  recognition  paradigm  when 
the  task  was  to  discriminate  between  the  two  modulation  frequencies.  When  the 
AM  frequency  was  fixed  and  the  task  was  to  identify  which  carrier  was 
modulated,  performance  was  near  chance.  Results  are  consistent  with 
processing  of  near-threshold  AM  through  modulation-specific  channels  that  are 
broadly  tuned  to  carrier  frequency. 
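
The two combination rules mentioned above can be written out explicitly. This is a sketch assuming equal-variance Gaussian decision variables; the subscripts 1 and 2 are labels introduced here for the two modulated carriers.

```latex
\[
d'_{\mathrm{linear}} = d'_1 + d'_2 ,
\qquad
d'_{\mathrm{independent}} = \sqrt{(d'_1)^2 + (d'_2)^2}
\]
```

For example, with d'_1 = d'_2 = 1, linear summation predicts a combined d' of 2, whereas optimal combination of two uncorrelated sources predicts about 1.41.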


Spectral Fusion Based on Coherence of Amplitude Modulation.

Stanley  Sheft  and  William  Yost 

The  ability  of  listeners  to  attend  to  a  subset  of  components  (the  target) 
of  a  tonal  complex  was  investigated  using  a  forced-choice  procedure.  The 
stimulus  was  an  eight-component  tonal  complex.  The  components  of  the  target 
subset (n = 1, 2, or 3 components) were distinguished from the complex by
coherent amplitude modulation (AM). Each trial of the cued 2IFC task was
preceded  by  a  presentation  of  the  coherently  modulated  target  components  in 
isolation.  For  each  interval  of  a  trial,  either  the  target  or  an  equal  number 
of non-target components shared the coherent AM. Subjects were required to
detect the interval in which the target components were coherently modulated.
The  remaining  components  of  the  complex  were  all  either  modulated  at  different 
rates,  modulated  with  a  random  shift  of  the  target-modulator  phase  angle,  or 
not  modulated.  Performance  was  measured  as  a  function  of  the  number  of  target 
components, the harmonic relationship among the target components, and the
harmonic  relationship  between  the  target  and  non-target  spectral  groups.  With 
coherent  AM  of  harmonic  target  components,  the  increment  in  d'  with  increasing 
n  exceeded  predictions  based  on  the  combination  of  independent  sources  of 
information,  suggesting  that  these  stimuli  may  be  processed  as  an  entity  by 
the  auditory  system. 


Temporal  Integration  in  Amplitude  Modulation  Detection. 

Stanley  Sheft  and  William  Yost 

Thresholds  for  detecting  sinusoidal  amplitude  modulation  (AM)  of  a 
wideband  noise  carrier  were  measured  as  a  function  of  the  duration  of  the 
modulating  signal.  The  carrier  was  either  (a)  gated  with  a  duration  that 
exceeded  the  duration  of  modulation  by  the  combined  stimulus  rise  and  fall 
times;  (b)  presented  with  a  fixed  duration  that  included  a  500-ms  carrier 
fringe  preceding  the  onset  of  modulation;  or  (c)  on  continuously.  In 
condition  (a),  the  gated-carrier  temporal  modulation  transfer  functions 
(TMTFs)  exhibited  a  bandpass  characteristic.  For  AM  frequencies  above  the 
individual  subject's  TMTF  highpass  segment,  the  mean  slope  of  the  integration 
functions was -7.46 dB per log unit duration. For the fringe and
continuous-carrier conditions (b and c), the mean slopes of the integration functions
were  respectively  -9.30  and  -9.36  dB  per  log  unit  duration.  Simulations  based 
on  integration  of  the  output  of  an  envelope  detector  approximate  the  results 
from  the  gated-carrier  conditions.  The  more  rapid  rates  of  integration 
obtained  in  the  fringe  and  continuous-carrier  conditions  may  be  due  to  "over- 
integration"  where  at  brief  modulation  durations  portions  of  the  unmodulated 
carrier  envelope  are  included  in  the  integration  of  modulating  signal  energy. 
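
As a worked illustration of what these slopes imply, the sketch below converts a slope in dB per log unit duration into the threshold change expected between two modulation durations. It assumes the integration function is linear in log duration over the range considered; the 50- and 100-ms durations and the function name are hypothetical.

```python
import math

def threshold_change_db(slope_db_per_log_unit, t_short, t_long):
    """Threshold change (dB) when modulation duration grows from t_short to t_long."""
    return slope_db_per_log_unit * math.log10(t_long / t_short)

# Mean slopes reported above for the three carrier conditions.
for label, slope in (("gated", -7.46), ("fringe", -9.30), ("continuous", -9.36)):
    change = threshold_change_db(slope, 50.0, 100.0)
    print(f"{label}: {change:.2f} dB per doubling of duration")
```

A doubling of duration is about 0.3 log units, so the gated-carrier slope corresponds to roughly a 2.2-dB improvement per doubling and the other two conditions to roughly 2.8 dB.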


Cued  Envelope-Correlation  Detection. 

Stanley  Sheft  and  William  Yost 

Involvement  of  envelope  coherence  in  source  segregation  requires  that 
listeners  can  both  detect  the  coherence  and  then  in  some  manner  selectively 
process  the  various  fluctuation  patterns  that  characterize  the  different 
sources.  Envelope  coherence  or  synchrony  detection  was  therefore  evaluated  in 
masking  and  discrimination  conditions  requiring  selective  cross-spectral 
processing  of  similar  patterns  of  envelope  fluctuation. 

If  a  spectral  subset  of  a  complex  sound  first  precedes  the  sound,  the 
subset  will  tend  to  be  heard  as  a  separate  auditory  image  when  repeated  as 
part  of  the  complex.  A  cued  2IFC  test  procedure  was  used  to  encourage  this 
type  of  spectral  segregation;  each  trial  was  preceded  by  presentation  of  the 
synchronous target noise bands as a cue complex. Center frequencies (CFs) of
the two to three target bands were either 500, 1250, or 3125 Hz, with
bandwidth ranging from 12.5 to 200 Hz. Stimulus duration was 400 ms.

In  contrast  to  previous  studies  of  synchrony  detection  that  used  a 
conventional  (uncued)  procedure,  results  from  the  cued  procedure  indicated 
relatively  small  effects  of  noise  bandwidth  and  of  the  frequency  separation 
between  noise  bands.  In  the  discrimination  conditions,  envelope  coherence  in 
the  nonsignal  observation  interval  also  had  little  effect  on  performance.  In 
the  masking  conditions,  the  task  was  to  detect  coherence  among  target  noise 
bands  in  the  presence  of  masker  noise  bands.  Target  and  masker  bands  shared  a 
common  bandwidth.  Though  performance  was  impaired  by  the  maskers,  there  was 
less  masking  than  generally  observed  when  modulated  maskers  are  used  in 
experiments  evaluating  sinusoidal  modulation  detection,  and  depth  and  rate 
discrimination.  When  more  than  one  masking  band  was  present,  there  was  little 
effect  of  coherence  among  the  masking  bands  of  the  nonsignal  interval  if  these 
bands were asynchronous with the target bands of the cue complex. With masker
coherence,  the  task  then  requires  synchrony  detection  restricted  to  the 
spectral  region  of  the  target  bands. 

Across  conditions,  spectral  location  of  the  target  bands  tended  to  have 
little  effect  on  synchrony-detection  performance,  especially  at  narrowest 
bandwidths.  This  is  consistent  with  the  type  of  wideband  processing  needed 
for  sound-source  segregation.  Results  from  the  masking  and  discrimination 
conditions  indicate  that  along  with  detecting  cross-spectral  synchrony,  the 
auditory  system  can  with  fairly  good  precision  selectively  process  similar 
concurrent  patterns  of  envelope  fluctuation. 


Detection  of  Intensity  Decrements  Followed  by  Increments. 

Stanley  Sheft  and  William  Yost 

Data  were  collected  from  a  modified  decrement  detection  procedure  in 
order  to  compare  subject  performance  to  the  predictions  of  two  decision 
statistics  derived  from  the  output  of  an  envelope  detector.  At  roughly  the 
midpoint  of  each  stimulus  presentation,  there  was  an  intensity  increment  of 
the  gated  wideband  noise  carrier.  The  task  was  to  detect  an  intensity 
decrement  just  preceding  the  increment  in  the  signal  interval  of  the  2IFC 
task.  Decrement  duration  ranged  from  2.5  to  40  ms.  Increments  of  3,  6,  and  9 
dB  were  used  with  the  onset  of  the  increment  randomly  occurring  150-300  ms 
from  the  stimulus  onset.  Decrement  detection  was  also  measured  for  conditions 
in  which  there  was  no  increment.  Best  performance  was  obtained  with  the  3-dB 
increment.  With  the  6-  or  9-dB  increment,  thresholds  were  significantly 
higher  and  showed  less  change  as  a  function  of  decrement  duration. 

Simulations  based  on  the  output  of  an  envelope  detector  used  either  the 
variance  or  the  ratio  of  the  largest-to-smallest  value  as  the  decision 
statistic.  For  both  statistics,  simulations  approximated  subject  performance 
in  the  conditions  with  no  increment.  Simulations,  however,  did  not  show  the 
drop  in  performance  with  the  6-  or  9-dB  increment,  suggesting  involvement  of 
multiplicative  internal  noise  in  envelope  detection. 
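
The two decision statistics can be illustrated with a minimal sketch. It is not the simulation used in the study: the envelope detector (half-wave rectification followed by a 5-ms moving average), the sampling rate, the 20-dB decrement, and all names are assumptions made here for illustration.

```python
import numpy as np

FS = 20000   # sampling rate in Hz (assumed)

def envelope_detector(x, window_ms=5.0):
    """Half-wave rectify, then smooth with a moving average (assumed 5-ms window)."""
    n_taps = int(FS * window_ms / 1000)
    return np.convolve(np.maximum(x, 0.0), np.ones(n_taps) / n_taps, mode="valid")

def decision_statistics(x):
    """Return (variance, largest-to-smallest ratio) of the detector output."""
    env = envelope_detector(x)
    return np.var(env), env.max() / env.min()

def noise_with_decrement(rng, decrement_db, dur_ms=500, start_ms=200, length_ms=40):
    """Gaussian noise with its level reduced by decrement_db over one segment."""
    x = rng.standard_normal(int(FS * dur_ms / 1000))
    i0 = int(FS * start_ms / 1000)
    i1 = i0 + int(FS * length_ms / 1000)
    x[i0:i1] *= 10 ** (-decrement_db / 20)
    return x

rng = np.random.default_rng(0)
standard = decision_statistics(noise_with_decrement(rng, 0.0))
signal = decision_statistics(noise_with_decrement(rng, 20.0))
print("no decrement   : variance %.4f, max/min %.1f" % standard)
print("20-dB decrement: variance %.4f, max/min %.1f" % signal)
```

Both statistics come out larger in the interval containing the decrement, which is the basis on which a simulated observer of this kind would choose the signal interval.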


The Effect of Intensity Increments on Decrement Detection.

Stanley  Sheft  and  William  Yost 

A  previous  study  [S.  Sheft  and  W.A.  Yost,  J.  Acoust.  Soc.  Am.  89,  1913 
(1991)]  indicated  that  decrement  detection  thresholds  are  affected  by  sudden 
level  changes  unrelated  to  the  decrement.  The  present  study  measured  the  time 
course  of  the  effect  of  an  intensity  increment  (pedestal)  on  decrement 
detection  with  wideband  noise.  In  one  set  of  conditions,  the  increment 
followed  the  decrement  with  the  temporal  separation  between  the  decrement 
offset and the increment onset varying from 0 to 80 ms. In another set, the
decrement occurred within the pedestal with the decrement onset 0-160 ms from
the pedestal onset. Increment size ranged from 3 to 12 dB. Compared to
thresholds  obtained  without  the  increment,  the  effect  of  the  increment 
depended  on  the  temporal  separation  between  both  the  stimulus  and  pedestal 
onsets  and  the  decrement  onset.  With  brief  separations,  performance  improved 
across  all  increment  levels.  Consistent  with  interference  due  to  the  neural 
onset  response,  the  introduction  of  a  silent  gap  before  the  pedestal 
eliminated  the  beneficial  effect  of  the  increment.  However,  the  effect  of  the 
increment  extended  out  to  160  ms,  past  possible  involvement  of  the  neural 
onset  response  and  short-term  adaptation. 


Modulation  Detection  Interference  with  Complex  Modulators. 

Stanley  Sheft  and  William  Yost 

Amplitude  modulation  (AM)  masking  has  been  a  topic  of  concern  in  recent 
studies  of  complex  sound  processing.  Previous  work  with  sinusoidal  modulators 
has  shown  that  even  with  the  masker  and  probe  carriers  widely  separated  in 
frequency,  masker  modulation  can  significantly  interfere  with  detection  of 
probe  modulation.  To  evaluate  the  effect  of  complex  stimulus  modulation  on 
cross-spectral  processing  of  AM,  a  two-tone  complex  was  used  as  the  masker 
modulator  in  the  present  study.  Experimental  conditions  involved  either 
detection  of  probe  modulation  or  discrimination  of  the  pattern  of  probe 
modulation. 

Similar  to  results  obtained  with  sinusoidal  masker  modulators,  there  was 
a  detrimental  effect  of  masker  modulation  on  the  processing  of  probe 
modulation.  Unlike  results  from  CMR  studies,  adding  modulated  masker 
components  never  led  to  a  reduction  in  the  amount  of  masking.  Interference 
was also obtained when the probe was modulated at the beat rate of the
two-tone masker modulator. The amount of interference due to beating of the
masker modulator diminished as either the probe AM rate or the
masker-modulator beat rate increased from 4 to 10 Hz. These results indicate a masking
effect  not  predicted  by  a  Fourier  representation  of  the  stimulus  envelope. 
Results will be discussed in terms of the possible involvement of the
cross-spectral processing of AM in sound-source determination.


Spectral Transposition of Envelope Modulation.

Stanley  Sheft  and  William  Yost 

Previous  work  [Sheft  and  Yost,  J.  Acoust.  Soc.  Am.  85  Suppl.  1,  S121 
(1989)]  has  shown  that  it  is  often  difficult  to  identify  which  component  of  a 
multicomponent  complex  is  amplitude  modulated.  The  role  of  the  carrier  in 
envelope  processing  was  examined  in  the  present  study  with  a  cued  2IFC 
envelope-discrimination  procedure.  The  narrowband  noises  of  the  two 
observation  intervals  differed  only  in  terms  of  their  pattern  of  envelope 
fluctuation,  with  the  signal  interval  a  repetition  of  the  cue  envelope  pattern 
at  a  center  frequency  (CF)  different  from  that  of  the  cue.  Cross-spectral 
transposition  of  envelope  information  was  evaluated  by  varying  the  number  of 
common  CFs  between  the  noise  bands  of  the  cue  complex  and  the  target  bands  of 
the  observation  intervals.  With  both  the  cue  and  observation  intervals 
consisting  of  a  single  noise  band,  the  ability  to  transpose  envelope 
information from the cue to the observation-interval CF diminished with
increasing  noise  bandwidth  from  12.5  to  200  Hz.  Results  from  the  multi-band 
conditions  indicate  that  listeners  are  unable  to  integrate  the  envelope 
information  across  audio-frequency  regions  to  improve  performance.  In  fact, 
the  added  noise  bands  led  to  a  significant  drop  in  performance  in  many 
conditions.  These  results  suggest  that  envelope  information  is  not  processed 
independent  of  the  spectral  location  of  the  modulated  carrier. 


Temporal  Representation  of  Rippled  Noise  in  the  Anteroventral  Cochlear  Nucleus 
of  the  Chinchilla. 

William  Shofner 

These  neurophysiological  experiments  have  been  directed  at  gaining  an 
understanding  of  how  auditory  neurons  encode  stimulus  information  found  in  the 
time  domain  of  complex  sounds,  particularly  those  complex  sounds  which  can 
generate  the  perception  of  pitch.  Information  in  the  time  domain  can  be  found 
in  the  waveform  fine  structure  and  envelope.  Rippled  noise  is  a  broadband 
stimulus  which  produces  the  perception  of  pitch,  yet  is  aperiodic  in  the  time 
domain.  Cos+  rippled  noise  is  generated  when  a  broadband  noise  is  delayed  and 
then  added  to  the  undelayed  noise.  The  resulting  stimulus  has  a  power 
spectrum  that  varies  in  a  cosinusoidal  fashion  in  which  the  peaks  are 
separated by 1/τ, where τ is the delay. The autocorrelation function of the
waveform fine structure of cos+ noise has a single peak at the delay of the
noise.  Thus,  unlike  wideband  noise  which  is  aperiodic  and  has  a  flat 
autocorrelation  function,  rippled  noise  is  an  aperiodic  stimulus  that  does  not 
have  a  flat  autocorrelation  function.  Moreover,  the  autocorrelation  function 
of  the  envelope  of  cos+  noise  also  has  a  single  peak  at  the  delay.  Cos-  noise 
is  generated  when  the  delayed  version  of  the  noise  is  subtracted  from  the 
undelayed  noise;  the  autocorrelation  function  of  the  waveform  fine  structure 
of  cos-  noise  shows  a  null  at  the  delay,  while  the  autocorrelation  function  of 
the  envelope  of  cos-  noise  shows  a  single  peak  at  the  delay.  Comparison  of 
responses to cos+ and cos- noise will provide data as to whether neurons
extract the delay from the waveform fine structure or from the envelope.
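
The delay-and-add construction and its effect on the fine-structure autocorrelation can be sketched as follows. The example uses Gaussian white noise, an arbitrary delay of 80 samples, and function names chosen here for illustration; it covers only the waveform fine structure, not the envelope.

```python
import numpy as np

def rippled_noise(noise, delay_samples, sign):
    """Add (sign=+1, cos+) or subtract (sign=-1, cos-) a delayed copy of the noise."""
    delayed = np.zeros_like(noise)
    delayed[delay_samples:] = noise[:-delay_samples]
    return noise + sign * delayed

def autocorr_at(x, lag):
    """Autocorrelation at one lag, normalized by its value at lag 0."""
    x = x - x.mean()
    return np.dot(x[:-lag], x[lag:]) / np.dot(x, x)

rng = np.random.default_rng(1)
noise = rng.standard_normal(200_000)
delay = 80   # samples; 4 ms at an assumed 20-kHz sampling rate

cos_plus = rippled_noise(noise, delay, +1)
cos_minus = rippled_noise(noise, delay, -1)

print("cos+ at the delay:", round(autocorr_at(cos_plus, delay), 2))
print("cos- at the delay:", round(autocorr_at(cos_minus, delay), 2))
print("cos+ at another lag:", round(autocorr_at(cos_plus, 50), 2))
```

With equal-amplitude delayed and undelayed noise, the normalized autocorrelation at the delay is close to +0.5 for cos+ noise and close to -0.5 for cos- noise, and close to zero at unrelated lags.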

Neurophysiological  experiments  investigated  the  temporal  responses  of 
cochlear  nucleus  neurons  in  the  chinchilla  to  rippled  noises.  In  some 
instances, responses to tone complexes were also obtained. These temporal
response  properties  are  those  based  on  the  time  intervals  between  individual 
spikes  and  were  evaluated  by  constructing  renewal  densities.  Renewal 
densities  have  been  referred  to  as  the  autocorrelation  function  of  the  spike 
train  and  are  constructed  by  summing  the  distributions  of  first-order  and  all 
higher-order  interspike  intervals.  The  renewal  density  shows  the  probability 
of  discharge  following  an  action  potential;  that  is,  the  renewal  density  shows 
the  average  firing  pattern  following  a  spike. 
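
A renewal density of this kind can be computed as an all-order interspike interval histogram. The sketch below is a minimal version; the function name, the millisecond units, the bin width, and the normalization to firing rate following a spike are choices made here and are not taken from the original analysis.

```python
import numpy as np

def renewal_density(spike_times, max_interval, bin_width):
    """All-order interspike interval histogram.

    Returns the left edge of each bin and the firing rate (spikes per unit
    time) observed at that interval following a spike.
    """
    spikes = np.sort(np.asarray(spike_times, dtype=float))
    n_bins = int(round(max_interval / bin_width))
    edges = np.arange(n_bins + 1) * bin_width
    counts = np.zeros(n_bins)
    for i, t0 in enumerate(spikes):
        # Intervals from this spike to every later spike: first-order and
        # all higher-order intervals are pooled into the same histogram.
        intervals = spikes[i + 1:] - t0
        intervals = intervals[intervals < max_interval]
        counts += np.histogram(intervals, bins=edges)[0]
    rate = counts / (len(spikes) * bin_width)
    return edges[:-1], rate

# A perfectly regular train with a 5-ms interval (times in ms).
spikes_ms = np.arange(200) * 5.0
lags_ms, rate = renewal_density(spikes_ms, max_interval=20.0, bin_width=1.0)
print(lags_ms[rate > 0])
```

For the regular train in the usage example, the density is nonzero only in the bins starting at 5, 10, and 15 ms, that is, at multiples of the interspike interval.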

In  general,  all  physiological  neuronal  types  recorded  can  show 
periodicities  in  their  discharge  in  response  to  tone  complexes  that  are 
related  to  the  fundamental  frequency  of  the  complex.  However,  only  those 
neurons  that  show  phase-locking  at  best  frequency  have  renewal  densities  that 
show  a  major  peak  at  the  delay  in  response  to  cos+  rippled  noise;  these 
neurons  show  a  null  at  the  delay  in  renewal  densities  in  response  to  cos- 
rippled  noise.  Thus,  neurons  which  show  phase-locking  to  best  frequency  tones 
extract  the  delay  of  rippled  noise  from  the  waveform  fine  structure  of  the 
stimulus.  Most  cochlear  nucleus  units  which  did  not  show  phase-locking  to 
best  frequency  tones  gave  renewal  densities  that  did  not  contain  features 
related  to  the  delay  of  rippled  noise.  A  few  of  these  non-phase-locked  units 
did  show  peaks  in  renewal  densities  at  the  delay  for  both  cos+  and  cos- 
rippled  noises,  suggesting  that  these  units  extract  the  delay  from  the 
stimulus  envelope. 

The  strength  of  synchrony  in  response  to  rippled  noise  was  quantified 
from  the  renewal  density  as  the  root-mean-squared  deviation  in  firing  rate 
around  the  delay  normalized  to  the  average  firing  rate.  This  analysis 
confirmed  that  the  units  that  show  the  strongest  synchrony  at  the  rippled 
noise  delay  are  low-best  frequency,  phase-locked  units. 

Synchrony  at  the  rippled  noise  delay  was  also  demonstrated  using  evoked 
potential recording. Autocorrelation functions of the neurophonic potential
showed peaks at the delay for both cos+ and cos- rippled noises. This
observation suggests that the neurophonic potential reflects temporal
properties  of  the  stimulus  envelope,  primarily  because  of  the  low-pass 
filtering  properties  of  the  recording  electrode.  The  finding  that  the 
neurophonic  reflects  the  stimulus  envelope  is  consistent  with  the  frequency 
following  response  recorded  with  scalp  electrodes  in  human  subjects.  In 
addition,  peaks  could  be  observed  in  autocorrelation  functions  of  neurophonic 
potentials  with  delays  as  short  as  1  ms;  peaks  were  never  observed  in  renewal 
densities  of  single  units  for  rippled  noise  delays  as  short  as  1  ms. 

The  results  demonstrate  that  a  temporal  representation  of  the  delay  of 
rippled  noise  does  exist  at  the  level  of  the  cochlear  nucleus;  this  temporal 
representation  found  at  the  single  unit  level  can  account  for  some,  but  not 
all  of  the  pitches  of  rippled  noise.  To  account  for  all  pitches,  it  may  be 
necessary  to  combine  the  outputs  of  several  frequency  channels. 


Increment Detection of Wideband and Bandlimited Noises by Chinchillas.

William  Shofner,  William  Yost  and  Stanley  Sheft 

Many  studies  have  addressed  the  effect  of  noise  bandwidth  on  intensity 
discrimination  in  humans,  and  these  studies  all  generally  agree  that  intensity 
discrimination thresholds decrease as the bandwidth of the noise increases.
As the bandwidth narrows, there is an increase in the temporal fluctuations of
the instantaneous power in the noise waveform, and these fluctuations
presumably interfere with a listener's ability to detect an increment in
intensity.  Green  (1960,  J.  Acoust.  Soc.  Am.,  32,  121-131)  derived  an 
analytical  model  for  intensity  discrimination  of  noise  for  an  ideal  energy 
detector  which  measures  the  power  in  two  noise  samples  and  selects  the 
waveform  with  the  largest  power.  The  ideal  energy  detector  model  is  described 
as 

$$ d' = (WT)^{1/2} \, \frac{S/N}{\left[1 + (S/N) + \tfrac{1}{2}(S/N)^2\right]^{1/2}} \qquad (1) $$


where  d'  is  the  detectability,  W  is  bandwidth,  T  is  signal  duration  (or 
integration  time),  S  is  the  signal  power  and  N  is  the  noise  power.  Note  that 
S/N is equivalent to the Weber fraction ΔI/I. This model takes into account
the increase in temporal fluctuations as the noise bandwidth narrows. The
equation  of  the  energy  detector  model  can  be  rearranged  to  give 

$$ 10\log\left(\frac{S/N}{\left[1 + (S/N) + \tfrac{1}{2}(S/N)^2\right]^{1/2}}\right) = 10\log\left(\frac{d'}{T^{1/2}}\right) - 5\log W \qquad (2) $$


Equation 2 is a linear equation having a slope of -5 dB/decade increase in
bandwidth; that is, the ideal energy detector model predicts that a 10-fold
increase in the noise bandwidth will result in a 5-dB decrease in threshold.
The reported degree of the bandwidth effect does vary among studies in
human  subjects  and  is  typically  less  than  the  predicted  -5  dB/decade  slope  of 
the  ideal  energy  detector  model. 
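
The predicted bandwidth slope can be checked numerically. The Python sketch below evaluates the ideal energy detector model in the form d' = (WT)^(1/2) (S/N) / [1 + (S/N) + (S/N)^2/2]^(1/2) and inverts it for the threshold S/N at a fixed d'. The function names and the example values of W, T and d' are assumptions chosen for illustration, not values from the study.

```python
import math

def dprime(W, T, snr):
    """Detectability of the ideal energy detector for bandwidth W (Hz),
    duration T (s) and signal-to-noise power ratio snr (= delta I / I)."""
    return math.sqrt(W * T) * snr / math.sqrt(1.0 + snr + 0.5 * snr ** 2)

def threshold_snr(W, T, d):
    """S/N needed to reach detectability d (closed-form inverse of dprime)."""
    c2 = d * d / (W * T)
    if c2 >= 2.0:
        # dprime() saturates at sqrt(2*W*T); larger d cannot be reached.
        raise ValueError("d' is not attainable for this W*T")
    c = math.sqrt(c2)
    return (c2 + c * math.sqrt(4.0 - c2)) / (2.0 - c2)

def lhs_db(snr):
    """Left-hand side of the rearranged model, in dB."""
    return 10.0 * math.log10(snr / math.sqrt(1.0 + snr + 0.5 * snr ** 2))

# Hypothetical example: T = 1 s, d' = 1, bandwidths one decade apart.
x_narrow = threshold_snr(100.0, 1.0, 1.0)
x_wide = threshold_snr(1000.0, 1.0, 1.0)
drop_db = lhs_db(x_narrow) - lhs_db(x_wide)  # 5 dB per decade of bandwidth
```

The 5-dB drop per decade holds exactly for the quantity on the left-hand side of the rearranged equation; for small S/N that quantity is close to 10 log(S/N) itself.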

The  present  study  examined  the  intensity  discrimination  capabilities  of 
the  chinchilla  for  noise  signals.  Six  (5  male  and  1  female)  binaural,  adult 
chinchillas (Chinchilla lanigera) served as subjects. Chinchillas were
trained to hold down a response lever with a reward chute and release the
lever in the presence of a 1 s increment in noise level. Increments were
generated  by  adding  the  noise  coherently  during  the  1  s  signal  interval. 
Animals  initiated  a  trial  by  pressing  down  on  the  response  lever,  and  the  1  s 
signal  interval  varied  randomly  from  1-8  s  after  the  animal  initiated  a  trial. 
Thresholds  were  obtained  using  a  two-down,  one-up  tracking  rule,  and  the 
animals  received  food  pellet  rewards  for  correct  responses.  Thresholds  were 
obtained as signal re: standard ratios in dB and were converted to DL or ΔI/I.
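
One plausible form of the conversion from a signal re: standard ratio to a DL is sketched below in Python. It assumes that coherent addition means the signal and standard waveforms add in amplitude, so a signal at x dB re: the standard has amplitude ratio a = 10^(x/20), giving ΔI/I = (1 + a)^2 - 1 and DL = 10 log(1 + ΔI/I). These formulas and the function names are assumptions; the abstract does not state the conversion that was used.

```python
import math

def weber_fraction(signal_re_standard_db):
    """Delta I / I for a signal added in phase to the standard (assumed)."""
    a = 10.0 ** (signal_re_standard_db / 20.0)  # amplitude ratio signal/standard
    return (1.0 + a) ** 2 - 1.0

def difference_limen_db(signal_re_standard_db):
    """DL in dB, i.e. 10*log10(1 + delta I / I)."""
    return 10.0 * math.log10(1.0 + weber_fraction(signal_re_standard_db))

# A signal equal in level to the standard (0 dB) doubles the amplitude.
wf_equal = weber_fraction(0.0)
dl_equal = difference_limen_db(0.0)
```

Under this assumption a 0-dB signal re: standard ratio corresponds to a Weber fraction of 3 and a DL of about 6 dB.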

Difference limens (DLs) for wideband noise were obtained as a function of the
overall level of the continuous masker. The range of levels
tested was limited to 42-82 dB SPL. The lowest level at 42 dB SPL approaches
the  noise  floor  of  the  sound  attenuating  chamber.  Levels  above  82  dB  SPL  were 
not  presented  in  order  to  avoid  causing  any  cochlear  damage  to  the  animals. 

For  the  average  chinchilla,  there  is  a  decrease  in  threshold  from  4.520  dB, 
which  then  appears  to  be  relatively  constant  between  52-82  dB  SPL.  The 
average  DL  over  the  range  of  52-82  dB  SPL  is  1.334  dB.  An  analysis  of 
variance  (ANOVA)  was  carried  out  on  the  signal  re:  standard  ratios  for  the 
average  chinchilla  for  the  levels  between  52-82  dB  SPL.  The  ANOVA  confirmed 
that  the  average  increment  detection  thresholds  are  equal  at  base  levels  of 
52, 62, 72 and 82 dB SPL (F = 1.887; F < F_{0.05(1),3,20}; P > 0.05). Thus, Weber's
Law  appears  to  hold  for  the  average  chinchilla  over  the  range  of  levels  from 
52-82  dB  SPL  for  the  wideband  noise. 

Increment  detection  thresholds  were  obtained  as  a  function  of  bandwidth 
for  a  continuous  masker  noise  of  72  dB  SPL.  Bandwidths  used  were  wider  than 
the  bandwidths  of  individual  auditory  filters;  thus,  for  a  given  bandwidth, 
the  output  across  several  frequency  channels  must  be  combined.  The  linear 
regression  through  the  data  has  a  slope  of  -3.6  dB/decade  increase  in 
bandwidth for the average chinchilla. The Y-intercept of this regression line
is 5.9 and is equal to 10 log(d'/T^{1/2}) from Equation 2. The empirical
thresholds  for  the  average  chinchilla  fall  above  those  predicted  by  the  ideal 
energy  detector  model.  The  bandwidth  slopes  obtained  from  individual  animals 
ranged from -2.6 to -4.4 dB/decade.

These  results  demonstrate  that  for  conditions  of  a  continuous  masker  and 
where the masker and signal have the same bandwidth, there is a decrease in
increment  detection  threshold  as  bandwidth  increases.  While  the  slopes  of  the 
bandwidth  function  obtained  for  the  chinchilla  are  less  than  the  predicted 
slope  of  the  ideal  energy  detector  model,  the  bandwidth  slopes  obtained  for 
the  chinchilla  are  similar  to  those  generally  reported  for  humans  which  are 
also shallower than -5 dB/decade (see P.N. Schacknow and D.H. Raab, 1976, J. Acoust.
Soc. Am., 60, 893-905).


What  Hearing  Really  Is:  Auditory  Image  Perception  and  Analysis. 

William  A.  Yost 

The  evolution  of  hearing  has  culminated  in  the  human's  remarkable  ability 
to  determine  the  sources  of  sounds.  Sound  source  identification  appears  to  be 
the  motivation  for  the  evolution  of  an  auditory  system.  A  century  of 
psychoacoustical  research  has  revealed  a  wealth  of  information  about 
processing  the  basic  properties  of  sound.  However,  far  less  attention  has 
been  paid  to  how  the  auditory  system  uses  this  information  to  determine  sound 
sources. Psychoacoustical data, models, and theories suggest that humans'
ability to spectrally resolve the components of a complex sound and their ability
to localize sounds are responsible for sound source identification. Sound
sources  provide  a  number  of  other  characteristics  that  could  be  used  by  an 
auditory  system  to  aid  it  in  sound  source  identification.  A  highly  evolved 
auditory  system  should  be  able  to  use  these  characteristics  to  help  form 
auditory  images  of  sound  sources.  One  such  characteristic  or  variable  is  the 
slow  temporal  modulation  of  sound  generated  by  almost  all  sound  sources. 
Consider, for instance, the slow frequency vibrato and amplitude jitter present
in  all  voiced  utterances.  We  argue  that  the  human  auditory  system  uses  such 
slow  temporal  modulations  to  identify  sound  sources.  Processing  these 
modulations  has  certain  consequences  for  auditory  perception  of  complex 
sounds.  As  a  consequence  of  processing  temporal  modulation,  the  auditory 
system  appears  to  group  together  tonal  components  that  share  a  common  pattern 
of  temporal  modulation  even  when  the  tonal  components  are  widely  spaced  in 
frequency.  These  effects  help  establish  the  relevance  of  slow  temporal 
modulation  for  sound  source  identification. 


Temporal Modulation Transfer Functions for Pure Tones.

William  A.  Yost  and  Stanley  Sheft 

Temporal  Modulation  Transfer  Functions  (TMTFs)  were  obtained  for 
sinusoidally  amplitude  modulated  (SAM)  pure  tones.  TMTFs  were  obtained  by 
determining the depth of SAM required for modulation detection in a
two-alternative, forced-choice adaptive procedure. TMTFs were obtained for 500-Hz,
1000-Hz, and 4000-Hz carrier frequencies; for durations of 125 ms and 500 ms
(all stimuli were shaped with a 20-ms raised cosine); in gated and continuous
background  conditions;  and  for  modulation  rates  ranging  from  2  to  128  Hz.  In 
the  gated  condition,  the  carrier  tone  was  gated  on  and  off  and  the  modulation 
occurred  over  the  full  duration  of  the  tone,  while  in  the  continuous 
condition,  the  carrier  tone  was  on  continuously  and  it  was  modulated  only 
during  the  observation  interval.  Thresholds  for  modulation  detection  were 
lower  in  the  continuous  than  in  the  gated  condition  and  the  thresholds  were 
lower  for  the  500-ms  than  for  the  125-ms  stimuli.  The  TMTFs  displayed  a 
bandpass  characteristic  in  all  conditions,  but  the  highpass  segment  was 
steeper  for  the  gated  than  for  the  continuous  conditions.  The  loss  in 
sensitivity  at  low  modulation  frequencies  meant  that  the  lowest  thresholds 
were  obtained  for  modulation  frequencies  of  4-Hz  to  8-Hz  in  the  continuous 
condition  and  above  16  Hz  in  the  gated  condition. 
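
For readers who want to reproduce a stimulus of this kind, the Python sketch below generates a gated SAM tone, x(t) = [1 + m sin(2π f_m t)] sin(2π f_c t), shaped with raised-cosine onset and offset ramps. The sampling rate, the starting phases, and the function names are assumptions for illustration; only the carrier frequencies, durations, ramp time, and modulation rates come from the abstract.

```python
import math

def sam_tone(fc, fm, depth, dur_s, fs=16000, ramp_s=0.020):
    """Gated sinusoidally amplitude modulated tone (list of samples).

    fc: carrier frequency (Hz); fm: modulation rate (Hz);
    depth: modulation depth m, 0..1; dur_s: duration (s);
    fs: sampling rate (Hz, assumed); ramp_s: raised-cosine ramp time (s).
    """
    n = int(round(dur_s * fs))
    n_ramp = int(round(ramp_s * fs))
    samples = []
    for k in range(n):
        t = k / fs
        envelope = 1.0 + depth * math.sin(2.0 * math.pi * fm * t)
        # Raised-cosine gate: rises from 0 at onset, falls to 0 at offset.
        if k < n_ramp:
            gate = 0.5 * (1.0 - math.cos(math.pi * k / n_ramp))
        elif k >= n - n_ramp:
            gate = 0.5 * (1.0 - math.cos(math.pi * (n - 1 - k) / n_ramp))
        else:
            gate = 1.0
        samples.append(gate * envelope * math.sin(2.0 * math.pi * fc * t))
    return samples

def depth_to_db(depth):
    """Modulation depth expressed as 20*log10(m)."""
    return 20.0 * math.log10(depth)

# Hypothetical example: 4000-Hz carrier, 16-Hz modulation, m = 0.5, 125 ms.
tone = sam_tone(4000.0, 16.0, 0.5, 0.125)
```

Modulation detection thresholds in this literature are commonly reported as 20 log(m), so a depth of m = 0.1 corresponds to -20 dB.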


Auditory  Image  Perception  and  Amplitude  Modulation:  Frequency  and  Intensity 
Discrimination of Individual Components for Amplitude-Modulated Two-Tone
Complexes.

William  A.  Yost 

Auditory  image  perception  describes  the  auditory  processing  of  sound 
sources, especially in complex, multi-source acoustic environments. A number
of  investigators  have  shown  that  slow  temporal  modulation  imparted  to  a  subset 
of  target  components  in  a  multi-tone  complex  sound  will  cause  the  target 
components  to  fuse  into  an  auditory  image.  As  such,  coherent  slow  temporal 
modulation  may  be  one  of  the  cues  used  by  the  auditory  system  to  form  auditory 
images,  and  these  images  would  in  turn  allow  for  the  identification  of  sound 
sources.  We  also  discovered  an  apparent  consequence  of  the  auditory  system's 
use  of  coherent  slow  temporal  modulation  to  form  auditory  images.  When 
sinusoidal amplitude modulation is used to modulate a two-tone complex
consisting of two tonal carriers of very different frequencies, listeners have
great difficulty in processing the temporal modulation pattern of either of
the two constituent carrier tones, even when they are separated by as many as
five  octaves.  That  is,  despite  the  fact  that  the  carriers  are  separated  by 
many  critical  bands,  a  listener's  ability  to  (1)  detect  the  presence  of 
modulation,  (2)  discriminate  a  change  in  modulation  rate,  and  (3)  discriminate 
a  change  in  modulation  depth  of  either  tone  is  severely  impaired  when  both 
tones  are  modulated  at  the  same  rate.  We  have  called  this  form  of  interference 
Modulation  Detection  Interference  (MDI)  in  that  the  processing  of  modulation 
in  one  frequency  channel  is  interfered  with  when  the  same  pattern  of 
modulation  is  present  in  another  frequency  channel.  The  investigators  surmised 
that  MDI  results  from  the  auditory  system  using  the  coherent  amplitude 
modulation  in  the  two  channels  to  fuse  the  two  carriers  into  a  single  auditory 
image.  Because  temporal  modulation  was  the  cue  used  to  fuse  the  tones  into  one 
auditory  image,  the  auditory  system  has  difficulty  processing  the  temporal 
modulation  of  either  of  the  two  constituent  tones  of  that  image. 

The  present  study  addresses  the  ability  of  the  auditory  system  to  process 
other  dimensions  of  the  tonal  complexes  when  the  two  tones  are  modulated  and 
MDI occurs. In particular, when a two-tone complex is amplitude modulated and
listeners have difficulty processing the modulation pattern of either tone
(i.e., MDI occurs), are the thresholds for discriminating 1) a change in the
frequency of one of the tonal carriers or 2) a change in the overall amplitude
of one of the amplitude-modulated complexes increased because of the common
pattern of modulation? Because temporal modulation, but neither frequency nor
overall amplitude, was used by the auditory system to form an auditory image,
it  is  predicted  that  neither  frequency  nor  intensity  discrimination  would  be 
affected  when  the  two  carrier  tones  are  modulated  and  MDI  occurs. 


Modulation  Detection  Interference  for  Discriminating  Modulation  Depth  of 
Sinusoidally  Amplitude  Modulated  Tones. 

William  A.  Yost 

The  ability  of  listeners  to  discriminate  a  change  in  the  depth  of 
sinusoidal  amplitude  modulation  (SAM)  of  a  4000-Hz  tone  was  measured  using  the 
MDI  (Modulation  Detection  Interference)  procedure  (see  Yost,  Sheft,  and  Opie, 
JASA,  Vol.  86,  1989,  p.  2138-2147).  Modulation  depth  discrimination  was 
measured  for  base  depths  of  0,  25,  50,  and  100%  depth  of  modulation  and  for 
rates  of  SAM  ranging  from  2  to  128  Hz.  To  a  first  approximation,  thresholds 
for  discriminating  a  change  in  SAM  depth  were  3-5%  of  the  base  depth,  except 
at  low-modulation  rates  (below  8  Hz)  where  thresholds  were  slightly  higher. 
When  a  1000-Hz  tone  modulated  at  the  same  rate  as  the  4000-Hz  tone  was 
presented  simultaneously  with  the  4000-Hz  tone,  thresholds  for  modulation 
depth  discrimination  increased  to  10-15%  of  the  base  depth.  No  such  increases 
in  thresholds  were  observed  when  the  1000-Hz  tone  was  not  modulated.  Thus,  as 
for  tasks  involving  detection  of  SAM  or  discriminating  a  change  in  SAM  rate 
(see  Yost,  Sheft,  and  Opie,  JASA,  Vol.  86,  1989,  p.  2138-2147),  the  presence 
of  another  modulated  tone  severely  disrupts  the  temporal  processing  of  the 
target sound, even when the two sounds are many critical bands apart. These
results  appear  consistent  with  the  assumption  that  slow  temporal  modulation 
may  be  used  to  form  auditory  images. 


Recognition  Memory  for  Arbitrary,  Complex  Waveforms. 

William  A.  Yost 

Recognition  memory  for  arbitrary  complex  stimuli  was  measured  using  the 
operating  characteristic  technique  developed  by  Egan  [J.P.  Egan,  AFCRC  TN  58- 
51,  AD  152650  (1958)].  Ten  complex  stimuli  (500  ms  in  duration),  each 
consisting  of  six,  randomly  chosen,  equal  amplitude  sinusoidal  components 
spanning  frequencies  from  300  to  3000  Hz  were  used  as  a  training  set.  After  15 
minutes  of  listening  to  the  training  set,  listeners  were  presented  a  20- 
stimulus  test  set  consisting  of  the  ten  training  stimuli  along  with  ten 
additional  complex  stimuli  (generated  in  the  same  way  as  the  ten  training 
stimuli). In a recognition memory paradigm, listeners used a five-point rating
scale  to  rate  their  confidence  that  a  stimulus  presented  from  the  test  set  was 
from  the  training  set.  Different  constraints  were  placed  on  the  selection  of 
the  complex  waveforms  to  determine  how  those  constraints  affected  recognition 
memory.  For  one  condition  there  were  no  constraints  except  as  described  above; 
for a second condition the six tonal components were harmonics (randomly
chosen) of a common fundamental; and for a third condition the six-tone
complex  was  sinusoidally  amplitude  modulated  at  rates  of  4,  16,  or  32  Hz.  The 
results  showed  that  the  method  can  be  used  to  study  processing  of  complex 
sounds  and  that  amplitude  modulation  enhances  the  ability  of  listeners  to 
remember arbitrary complex sounds.


Processing  Temporal  Modulation  of  Narrow-Band  Noises. 

William  A.  Yost  and  Stanley  Sheft 

The  ability  of  listeners  to  detect  sinusoidal  amplitude  modulation  of 
probe  signals  was  measured  in  a  number  of  conditions.  The  probes  were  500  ms 
in duration and were either a 4000-Hz tone or narrow bands of noise centered
at  4000  Hz  with  bandwidths  ranging  from  16  Hz  to  1024  Hz.  Thresholds  for  probe 
modulation  were  determined  when  a  masker,  consisting  of  narrow-band  noises 
centered  at  1000  Hz  with  bandwidths  ranging  from  32  Hz  to  1024  Hz,  was 
simultaneously presented with the probe. Thresholds for probe modulation
detection were also obtained when there was no masker. The rates of sinusoidal
amplitude  modulation  of  the  probes  (and  in  some  cases  of  the  maskers)  ranged 
from  2  Hz  to  128  Hz.  Thresholds  were  obtained  from  five  listeners  in  a  two- 
alternative,  forced-choice  adaptive  psychophysical  task.  When  the  bandwidth  of 
the  noise  was  less  than  approximately  512  Hz,  modulation  detection  thresholds 
were  higher  than  those  obtained  for  tonal  signals  and  noises  whose  bandwidths 
were  greater  than  512  Hz.  The  results  from  the  various  conditions  are 
consistent  with  the  assumption  that  slow  temporal  modulations  inherent  in 
narrow-band noises interfere with the ability of listeners to detect low rates
of  sinusoidal  amplitude  modulation.  Such  interference  exists  even  when  the 
masker  is  two  octaves  away  from  the  probe.  The  results  are  consistent  with  the 
recent  literature  involving  temporal  processing  of  signals  across  wide 
frequency  regions. 


[Figure: 16 & 32 Hz Wide Probes - No Masker. Horizontal axis: Modulation Rate (Hz). Legend: 16-Hz wide, 32-Hz wide.]


A  New  Psychophysical  Procedure  for  Measuring  Selective  Attention. 

William  A.  Yost,  Raymond  H.  Dye,  Mark  Stellmack  and  Stanley  Sheft 

Recently  a  number  of  new  psychophysical  procedures  have  been  proposed  to 
measure and account for subjects' performance in multi-dimensional stimulus
paradigms.  These  procedures  involve  stimulus  contexts  in  which  listeners  are 
presented  more  than  one  stimulus  dimension  on  any  experimental  trial  and  are 
asked  to  make  a  binary  response  concerning  the  stimulus  presentation.  These 
methods  are  an  extension  of  the  earlier  signal-detection  work  that  accounted 
for subjects' performance in multiple observation-interval tasks. These newer
procedures  are  being  developed  because  of  the  renewed  interest  in  auditory 
perception  of  complex  sounds.  This  work  describes  a  procedure  designed  for 
conditions  in  which  the  listener  receives  two  stimuli  per  observation 
interval, one the target and the other the non-target (or distractor
stimulus).  Each  stimulus  varies  along  the  same  continuum  and  the  listener 
judges,  with  a  binary  response,  which  of  two  target  possibilities  occurred. 
The procedure is a simplified variation of the General Recognition Theory of
Ashby  and  Townsend. 

A number of simple assumptions underlie this procedure: t represents the
target and d the distractor or non-target; t_i and d_j are, then, the various
values of the two stimuli along a continuum. We assume that i and j cover the
same range, -m ≤ i, j ≤ m; that is, the stimulus variation along the continuum
is the same for each dimension and the stimulus variation is symmetrical about
some midpoint (i.e., i, j = 0). On any trial t_i + d_j is presented and the
listener is to determine whether t_i on that trial belongs to one of two
classes, P (positive class) or
N  (negative  class).  For  instance,  t  and  d  could  be  two  different  frequencies 
(d  a  low-frequency  tone  and  t  a  high-frequency  tone)  and  the  continuum  could 
be  the  interaural  time  difference  for  each  tone.  Thus,  on  every  trial  both 
tones  are  simultaneously  presented  each  with  a  randomly  chosen  value  of  an 
interaural  time  difference.  Half  of  the  possible  time  delays  would  favor  the 
right  ear  and  half  would  favor  the  left  ear.  The  listener  decides  if  the  high- 
frequency  tone  (t)  is  left  (N)  or  right  (P)  of  midline.  Thus,  the  range  of 
possible interaural time differences is the same for both tones and the
interaural time differences are chosen to be symmetric about 0 (midline). Note
that if i and j vary from -m to m, there are (2m+1)^2 trials in the experiment,
and each combination of t_i and d_j is presented once with the listener deciding
for each presentation whether t_i belonged to class N or P. Thus, the entries
in  the  response  matrix  are  either  Ps  (in  the  example,  right  responses)  or  Ns 
(left  responses). 

This  procedure  would  be  used  when  one  is  interested  in  how  a  listener 
processes  one  stimulus  in  the  presence  of  another  stimulus,  especially  when 
the values of each stimulus are supra-threshold. That is, how much attention
or  weight  is  given  to  the  d  dimension  when  listeners  are  asked  to  attend  to  or 
process  the  t  dimension?  A  number  of  vocabularies  have  been  used  to  describe 
such  processing.  One  can  describe  the  task  in  terms  of  selective  or  divided 
attention,  or  in  terms  of  whether  or  not  the  listener  is  analytic  or  synthetic 
in  his  or  her  ability  to  process  the  target  dimension,  or  in  how  well  the 
listener can "hear out" the target in the presence of the distractor. In many
complex stimulus tasks one stimulus stands out (e.g., forms a stream or an image
or an object or a scene) from the background of other stimuli, and the
investigator  wants  to  determine  the  extent  to  which  this  stimulus  is 
perceptually  separable  from  the  other  stimulus  or  stimuli. 

Thus, a crucial performance measure is the weight (w, with 0 ≤ w ≤ 1)
assigned to the distractor or d dimension. In order to determine w, we make the
assumption that each cell in the decision matrix (C_ij) is given a value by:

$$ C_{ij} = v_i + (w * v_j), \qquad 1) $$

where v_i and v_j are the values associated with each stimulus variable such
that -v_m ≤ v_i, v_j ≤ v_m, with v_i = f(t_i) and v_j = f(d_j), the function f() being
monotonic; and w (0 ≤ w ≤ 1) is the relative weight given to the d dimension.

The  Decision  Rule  is: 

respond P, iff C_ij > b;
respond N, iff C_ij < b;
guess, iff C_ij = b, where -((w*v_m)+v_m) < b < ((w*v_m)+v_m). 2)

The constant b is a bias the listener might have for responding P (or N).
Therefore, the response space is divided into the P and N areas when C_ij = b,
or by the line:

$$ v_i + (w * v_j) = b; \quad v_i = -w * v_j + b. \qquad 3) $$

The  slope  of  the  straight  line  in  the  decision  matrix  determines  the  weight, 
w,  and  the  intercept  of  the  line  determines  the  bias,  b.  In  order  to  construct 
a  response  matrix  (an  ideal  response  matrix  in  this  case)  an  Assignment  Rule 
is used such that a value of 1 is assigned to a P response, -1 to an N response,
and the rows and columns of the matrix are determined by i and j, with -m ≤ i, j ≤ m
(i.e., v_i = i and v_j = j). The resulting response matrix (with cells R_ij
assigned a value of 1 or -1) can be generated in which the line (i = -wj + b) is
the best fitting line as defined by equation 3). The slope of this line
determines w, the relative weight assigned to the d dimension, and the
intercept, b, determines the listener's bias. The best fitting line can be
determined from a response matrix in one of two ways (both are equivalent for
the ideal response matrix defined above), as explained below.

One rule uses the fact that the best fitting line occurs when the
sum of the C_ij's corresponding to the P response (all positive C_ij's) minus the
sum of the C_ij's corresponding to the N response (all negative C_ij's) is maximal,
that is, when

$$ \sum_{i=-wj+b}^{m} \sum_{j=-m}^{m} P \;-\; \sum_{i=-m}^{-wj+b} \sum_{j=-m}^{m} N \quad \text{is maximum.} \qquad 4) $$

Thus, the straight line which maximizes equation 4) can be determined from
real data in order to determine the slope (weight) and intercept (bias).
Alternatively, a least squares criterion may be used. That is, one can find
that straight line which provides the best least-squares fit to the data as
defined by minimizing the sum of the squared deviations between the line and
the deviate responses. In order to determine how well this process might work
for data that could be obtained in a real experiment, we used a Monte Carlo
technique to evaluate the procedure, and the outcomes showed that the technique
is very robust, usually taking a single block of trials to yield a reliable
estimate of the slope and intercept of the best fitting line as defined above.
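
The procedure can be illustrated with a short Python sketch that builds an ideal response matrix from the decision and assignment rules (with v_i = i and v_j = j) and then recovers the weight and bias. The fit shown here is a plain grid search that maximizes the number of responses the line divides correctly; it is a stand-in chosen for simplicity, not necessarily the authors' fitting routine. The function names, the grids, and the treatment of ties (counted as N rather than guessed) are assumptions.

```python
def ideal_response_matrix(m, w, b):
    """Responses R[(i, j)] = +1 (P) or -1 (N) for -m <= i, j <= m."""
    responses = {}
    for i in range(-m, m + 1):
        for j in range(-m, m + 1):
            c = i + w * j  # decision value C_ij with v_i = i, v_j = j
            # Assumption: a tie (c == b) is scored as N instead of a guess.
            responses[(i, j)] = 1 if c > b else -1
    return responses

def fit_weight_and_bias(responses, w_grid, b_grid):
    """Grid search for the line i = -w*j + b that divides the most responses."""
    best_w, best_b, best_correct = None, None, -1
    for w in w_grid:
        for b in b_grid:
            n_correct = sum(
                1 for (i, j), r in responses.items()
                if (1 if i + w * j > b else -1) == r
            )
            if n_correct > best_correct:
                best_w, best_b, best_correct = w, b, n_correct
    return best_w, best_b, best_correct

# Hypothetical listener: m = 5, distractor weight 0.5, bias 0.3.
m = 5
matrix = ideal_response_matrix(m, w=0.5, b=0.3)
w_grid = [k * 0.05 for k in range(0, 41)]   # candidate weights 0.0 .. 2.0
b_grid = [k / 10.0 for k in range(-20, 21)]  # candidate biases -2.0 .. 2.0
w_fit, b_fit, n_correct = fit_weight_and_bias(matrix, w_grid, b_grid)
```

Because the stimulus values lie on a coarse lattice, several nearby lines divide an ideal matrix equally well, so the recovered weight is only determined to within a small interval around the true value. Replacing the ideal matrix with simulated noisy responses would turn this into a simple Monte Carlo check of the kind described above.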

The  approach  for  obtaining  an  estimate  of  w  and  b  is  essentially 
distribution-free with only a few assumptions being required. The method could
be adapted to a model of how listeners might assign values to the cells in the
decision space, and a similar approach to the one outlined above could be
derived to evaluate the model. A number of statistics can be obtained to
evaluate how well the best fitting line (as defined by equation 3) fits the
response matrix; for instance, 1) the percent of N and P responses
successfully divided into two classes by the line or 2) the sum of the squared
deviations of the deviate responses from the line. An interesting aspect of
the approach is the ability to make the distractor the target and the target
the distractor and to repeat the procedure as a means of testing the symmetry
of the measures obtained. Also note that the stimulus values (t_i and d_j) are
supra-threshold, allowing one to explore a large stimulus space.


Measuring  Modulation  Detection  Interference  With  a  New  Procedure. 

William  A.  Yost 

The  new  procedure  for  measuring  selective  attention  described  elsewhere 
in  this  report  was  used  to  measure  depth  of  modulation  processing  in  a 
modulation  detection  interference  (MDI)  paradigm.  Listeners  were  asked  to 
indicate  whether  a  500-ms  target  tone  (4000-Hz  carrier)  which  was  sinusoidally 
amplitude  modulated  (SAM)  at  16  Hz  had  a  larger  or  smaller  depth  of  modulation 
than  a  standard  SAM  tone  presented  as  a  cue  before  each  trial.  The  standard 
(cue) was modulated with a depth of modulation of -15 dB and the target
stimulus was modulated with depths of modulation of -7.5, -9, -10.5, -12,
-13.5, -16.5, -18, -19.5, -21 dB (i.e. half with depths greater than -15 dB and
half with depths less than -15 dB). The ability to make this depth
discrimination judgment was obtained in the three basic MDI conditions: the Target
Alone (TA) condition in which only the target was presented, the Unmodulated
Distractor (UD) condition in which a 1000-Hz tone was presented simultaneously
with the target, and the Modulated Distractor (MD) condition in which the
distractor was also amplitude modulated with the same depths of modulation as
the target. The rates of SAM for the distractor were either the same as the
target,  16  Hz,  or  different,  8  or  32  Hz.  For  three  of  the  listeners  their 
weights  for  the  distractor  dimension  estimated  from  the  best  fitting  line  to 
the  response  matrix  as  defined  for  the  new  procedure  for  measuring  selective 
attention  were  all  0.0  for  the  TA  condition  and  0.0,  0.0,  and  0.1  for  the  UD 
condition,  and  then  rose  to  0.9,  1.1,  and  0.6  for  the  MD  condition  when  the 
distractor and target were modulated at the same rate (16 Hz). These results
are consistent with the general MDI finding in that listeners are analytic in
the TA and UD conditions but synthetic when the distractor and target are
modulated together at the same rate in the MD condition. The fourth listener
had a weight of 0.0 in the TA condition but weights of greater than 1 for the
UD  condition  (3.3)  and  the  MD  condition  (2.6),  indicating  that  in  these 
conditions  she  was  using  the  depth  of  modulation  of  the  distractor  and  not  the 
target  despite  the  fact  that  feedback  was  provided  on  each  trial  consistent 
with target modulation depth. She was behaving somewhat analytically (taking
reciprocals, 3.3 is like a 0.3 weight and 2.6 is like a 0.4 weight) but she was responding as if she
could  not  hear  the  target  (she  could  hear  it  since  she  performed  as  the  other 
listeners  did  in  the  TA  condition,  but  when  both  the  distractor  and  target 
were  present  she  responded  as  if  she  was  largely  ignoring  the  target).  Thus, 
the  new  procedure  allows  for  a  robust  metric  for  determining  individual 
differences  based  on  the  concept  of  how  listeners  assign  weights  to  the 
distractor  dimension.  All  listeners  became  more  analytic  (weights  toward  0.0) 
when  the  distractor  was  modulated  at  a  different  rate  than  the  target:  at  8  Hz 
the weights were 0.1, 0.3, 0.0, 0.2 and at 32 Hz they were 0.0, 0.2, 0.0, and
0.3. The bias of the listeners was always within ±1, indicating very little
response bias by these listeners. Again these results are consistent with the
description of listener performance in modulation detection or discrimination
in MDI paradigms. The new selective attention paradigm allows for a
description  of  performance  that  is  more  valid  in  terms  of  describing  how 
listeners  hear  out  targets  in  the  presence  of  distractors  and  the  procedure 
captures  individual  differences  in  a  more  meaningful  manner  than  measures  of 
thresholds  obtained  in  the  traditional  MDI  procedures. 